Attention mechanism
See Transformer model, BERT, Recurrent neural network
Compared with the Sentence embedding approach, the attention mechanism allows the model to retain information from longer sentences. Instead of compressing the whole input into a single fixed vector, the context vector is generated dynamically at each decoding step through shortcut connections to the words of the input sentence.
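A minimal sketch of how such a dynamic context vector can be computed, assuming dot-product scoring between a decoder query and the encoder hidden states (Bahdanau's original paper uses an additive score instead; the function and variable names here are illustrative, not from any library):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_context(query, encoder_states):
    """Weighted sum over encoder states, weights derived from the query.

    query:          (d,)   current decoder state
    encoder_states: (T, d) one hidden state per input word
    """
    scores = encoder_states @ query      # (T,) similarity to each input word
    weights = softmax(scores)            # attention distribution over the input
    context = weights @ encoder_states   # (d,) dynamic context vector
    return context, weights

# toy example: an input sentence of 4 words, hidden size 3
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
q = rng.normal(size=(3,))
ctx, w = attention_context(q, H)
```

The weights form a probability distribution over the input words, so the context vector is rebuilt for every query rather than fixed once per sentence.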
Variations
Libraries and code
Tutorials and articles
- Lil’Log: Attention? Attention!
- CMU Neural Nets for NLP 2017 (9): Attention
- The math behind Attention: Keys, Queries, and Values matrices
- Understanding and Coding Self-Attention, Multi-Head Attention, Cross-Attention, and Causal-Attention in LLMs
References
- Bahdanau2015: Neural Machine Translation by Jointly Learning to Align and Translate - the earliest attention mechanism in Deep learning?
- Attention Is All You Need